Overview

Dataset statistics

Number of variables: 9
Number of observations: 768
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.1 KiB
Average record size in memory: 72.2 B

Variable types

Numeric: 9

Alerts

Pregnancies is highly correlated with Age (High correlation)
SkinThickness is highly correlated with BMI (High correlation)
BMI is highly correlated with SkinThickness (High correlation)
Age is highly correlated with Pregnancies (High correlation)
Glucose is highly correlated with Insulin and 1 other field (High correlation)
BloodPressure is highly correlated with BMI (High correlation)
Insulin is highly correlated with Glucose (High correlation)
BMI is highly correlated with BloodPressure and 2 other fields (High correlation)
DiabetesPedigreeFunction is highly correlated with BMI (High correlation)
Outcome is highly correlated with Glucose (High correlation)
Pregnancies has 111 (14.5%) zeros (Zeros)
Outcome has 500 (65.1%) zeros (Zeros)
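
The zero alerts follow directly from the counts: each percentage is the number of zeros divided by the 768 observations. A minimal sketch of that arithmetic (the helper name `zero_share` is illustrative and not part of pandas-profiling):

```python
def zero_share(n_zeros: int, n_rows: int) -> float:
    """Return the percentage of rows equal to zero, rounded to one decimal."""
    return round(100.0 * n_zeros / n_rows, 1)


N_ROWS = 768  # number of observations reported in the overview

pregnancies_share = zero_share(111, N_ROWS)  # zeros reported for Pregnancies
outcome_share = zero_share(500, N_ROWS)      # zeros reported for Outcome
print(pregnancies_share, outcome_share)
```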

Reproduction

Analysis started: 2022-10-06 18:38:57.780434
Analysis finished: 2022-10-06 18:39:36.484501
Duration: 38.7 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

Pregnancies
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 17
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.845052083
Minimum: 0
Maximum: 17
Zeros: 111
Zeros (%): 14.5%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 3
Q3: 6
95-th percentile: 10
Maximum: 17
Range: 17
Interquartile range (IQR): 5
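
As a cross-check, the quantile statistics above can be recomputed from the Pregnancies value counts listed in this report. The sketch below assumes percentiles use linear interpolation between neighbouring ranks, which is the pandas default but is not stated in the report; `COUNTS`, `expand`, and `percentile` are illustrative names.

```python
# Value -> count for Pregnancies, read off the value-count tables in this report.
COUNTS = {0: 111, 1: 135, 2: 103, 3: 75, 4: 68, 5: 57, 6: 50, 7: 45, 8: 38,
          9: 28, 10: 24, 11: 11, 12: 9, 13: 10, 14: 2, 15: 1, 17: 1}


def expand(counts):
    """Rebuild the sorted column from a value -> count mapping."""
    return [value for value in sorted(counts) for _ in range(counts[value])]


def percentile(sorted_values, q):
    """q-th percentile with linear interpolation (assumed method)."""
    pos = q / 100 * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


values = expand(COUNTS)
summary = {q: percentile(values, q) for q in (0, 5, 25, 50, 75, 95, 100)}
iqr = summary[75] - summary[25]            # Q3 - Q1
value_range = summary[100] - summary[0]    # maximum - minimum
print(summary, iqr, value_range)
```

The interquartile range is Q3 minus Q1 and the range is maximum minus minimum, so both follow from the percentiles without touching the data again.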

Descriptive statistics

Standard deviation: 3.369578063
Coefficient of variation (CV): 0.8763413316
Kurtosis: 0.1592197775
Mean: 3.845052083
Median Absolute Deviation (MAD): 2
Skewness: 0.9016739792
Sum: 2953
Variance: 11.35405632
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=17)
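
Several of the descriptive statistics are tied together: the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A hedged sketch that recomputes them from the Pregnancies value counts in this report, assuming the sample (n - 1) form of the variance; the variable names are illustrative.

```python
import statistics

# Value -> count for Pregnancies, read off the value-count tables in this report.
COUNTS = {0: 111, 1: 135, 2: 103, 3: 75, 4: 68, 5: 57, 6: 50, 7: 45, 8: 38,
          9: 28, 10: 24, 11: 11, 12: 9, 13: 10, 14: 2, 15: 1, 17: 1}

values = [value for value, n in COUNTS.items() for _ in range(n)]

total = sum(values)
mean = statistics.fmean(values)
variance = statistics.variance(values)   # sample variance, n - 1 denominator
std = statistics.stdev(values)           # square root of the sample variance
cv = std / mean                          # coefficient of variation
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)  # median absolute deviation

print(total, mean, variance, std, cv, mad)
```
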
Most frequent values
Value               Count  Frequency (%)
1                     135  17.6%
0                     111  14.5%
2                     103  13.4%
3                      75  9.8%
4                      68  8.9%
5                      57  7.4%
6                      50  6.5%
7                      45  5.9%
8                      38  4.9%
9                      28  3.6%
Other values (7)       58  7.6%

Smallest values
Value               Count  Frequency (%)
0                     111  14.5%
1                     135  17.6%
2                     103  13.4%
3                      75  9.8%
4                      68  8.9%
5                      57  7.4%
6                      50  6.5%
7                      45  5.9%
8                      38  4.9%
9                      28  3.6%

Largest values
Value               Count  Frequency (%)
17                      1  0.1%
15                      1  0.1%
14                      2  0.3%
13                     10  1.3%
12                      9  1.2%
11                     11  1.4%
10                     24  3.1%
9                      28  3.6%
8                      38  4.9%
7                      45  5.9%
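
The "Other values (7)" row in the first value-count table can be reproduced by keeping the ten most frequent values and pooling the rest. This is a sketch only: the cut-off of ten is inferred from the rows shown rather than from the report's configuration, and `N_TOP` and `pct` are illustrative names.

```python
from collections import Counter

# Value -> count for Pregnancies, read off the value-count tables in this report.
COUNTS = Counter({0: 111, 1: 135, 2: 103, 3: 75, 4: 68, 5: 57, 6: 50, 7: 45,
                  8: 38, 9: 28, 10: 24, 11: 11, 12: 9, 13: 10, 14: 2, 15: 1,
                  17: 1})
N_TOP = 10  # assumed cut-off, inferred from the ten rows listed

total = sum(COUNTS.values())
top = COUNTS.most_common(N_TOP)                 # (value, count), highest count first
top_values = {value for value, _ in top}
rest = {v: c for v, c in COUNTS.items() if v not in top_values}
other_count = sum(rest.values())


def pct(count):
    """Share of all observations, as a percentage with one decimal."""
    return round(100 * count / total, 1)


for value, count in top:
    print(value, count, f"{pct(count)}%")
print(f"Other values ({len(rest)})", other_count, f"{pct(other_count)}%")
```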

Glucose
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 136
Distinct (%): 17.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 121.6867628
Minimum: 44
Maximum: 199
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 44
5-th percentile: 80
Q1: 99.75
median: 117
Q3: 140.25
95-th percentile: 181
Maximum: 199
Range: 155
Interquartile range (IQR): 40.5

Descriptive statistics

Standard deviation: 30.43594887
Coefficient of variation (CV): 0.2501171711
Kurtosis: -0.2591586041
Mean: 121.6867628
Median Absolute Deviation (MAD): 20
Skewness: 0.53271866
Sum: 93455.43381
Variance: 926.3469834
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Most frequent values
Value               Count  Frequency (%)
99                     17  2.2%
100                    17  2.2%
111                    14  1.8%
129                    14  1.8%
125                    14  1.8%
106                    14  1.8%
112                    13  1.7%
108                    13  1.7%
95                     13  1.7%
105                    13  1.7%
Other values (126)    626  81.5%

Smallest values
Value               Count  Frequency (%)
44                      1  0.1%
56                      1  0.1%
57                      2  0.3%
61                      1  0.1%
62                      1  0.1%
65                      1  0.1%
67                      1  0.1%
68                      3  0.4%
71                      4  0.5%
72                      1  0.1%

Largest values
Value               Count  Frequency (%)
199                     1  0.1%
198                     1  0.1%
197                     4  0.5%
196                     3  0.4%
195                     2  0.3%
194                     3  0.4%
193                     2  0.3%
191                     1  0.1%
190                     1  0.1%
189                     4  0.5%

BloodPressure
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 47
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.40518417
Minimum: 24
Maximum: 122
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 24
5-th percentile: 52
Q1: 64
median: 72.20259209
Q3: 80
95-th percentile: 90
Maximum: 122
Range: 98
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 12.09634618
Coefficient of variation (CV): 0.1670646422
Kurtosis: 1.097783722
Mean: 72.40518417
Median Absolute Deviation (MAD): 7.797407913
Skewness: 0.1373053674
Sum: 55607.18145
Variance: 146.321591
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=47)
Most frequent values
Value               Count  Frequency (%)
70                     57  7.4%
74                     52  6.8%
78                     45  5.9%
68                     45  5.9%
72                     44  5.7%
64                     43  5.6%
80                     40  5.2%
76                     39  5.1%
60                     37  4.8%
72.40518417            35  4.6%
Other values (37)     331  43.1%

Smallest values
Value               Count  Frequency (%)
24                      1  0.1%
30                      2  0.3%
38                      1  0.1%
40                      1  0.1%
44                      4  0.5%
46                      2  0.3%
48                      5  0.7%
50                     13  1.7%
52                     11  1.4%
54                     11  1.4%

Largest values
Value               Count  Frequency (%)
122                     1  0.1%
114                     1  0.1%
110                     3  0.4%
108                     2  0.3%
106                     3  0.4%
104                     2  0.3%
102                     1  0.1%
100                     3  0.4%
98                      3  0.4%
96                      4  0.5%

SkinThickness
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 50
Distinct (%): 6.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.06247353
Minimum: 7
Maximum: 63
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 7
5-th percentile: 14.35
Q1: 25
median: 29.15341959
Q3: 32
95-th percentile: 44
Maximum: 63
Range: 56
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 8.420915872
Coefficient of variation (CV): 0.2897522079
Kurtosis: 0.7860444004
Mean: 29.06247353
Median Absolute Deviation (MAD): 3.846580407
Skewness: 0.2219494157
Sum: 22319.97967
Variance: 70.91182413
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Most frequent values
Value               Count  Frequency (%)
29.15341959           228  29.7%
32                     31  4.0%
30                     27  3.5%
27                     23  3.0%
23                     22  2.9%
33                     20  2.6%
28                     20  2.6%
18                     20  2.6%
31                     19  2.5%
19                     18  2.3%
Other values (40)     340  44.3%

Smallest values
Value               Count  Frequency (%)
7                       2  0.3%
8                       2  0.3%
10                      5  0.7%
11                      6  0.8%
12                      7  0.9%
13                     11  1.4%
14                      6  0.8%
15                     14  1.8%
16                      6  0.8%
17                     14  1.8%

Largest values
Value               Count  Frequency (%)
63                      1  0.1%
60                      1  0.1%
56                      1  0.1%
54                      2  0.3%
52                      2  0.3%
51                      1  0.1%
50                      3  0.4%
49                      3  0.4%
48                      4  0.5%
47                      4  0.5%

Insulin
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 186
Distinct (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 155.5482234
Minimum: 14
Maximum: 846
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 14
5-th percentile: 50
Q1: 121.5
median: 155.5482234
Q3: 155.5482234
95-th percentile: 293
Maximum: 846
Range: 832
Interquartile range (IQR): 34.04822335

Descriptive statistics

Standard deviation: 85.02110777
Coefficient of variation (CV): 0.5465900281
Kurtosis: 15.18523275
Mean: 155.5482234
Median Absolute Deviation (MAD): 3.5
Skewness: 3.019083661
Sum: 119461.0355
Variance: 7228.588766
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Most frequent values
Value               Count  Frequency (%)
155.5482234           374  48.7%
105                    11  1.4%
130                     9  1.2%
140                     9  1.2%
120                     8  1.0%
94                      7  0.9%
180                     7  0.9%
100                     7  0.9%
135                     6  0.8%
115                     6  0.8%
Other values (176)    324  42.2%

Smallest values
Value               Count  Frequency (%)
14                      1  0.1%
15                      1  0.1%
16                      1  0.1%
18                      2  0.3%
22                      1  0.1%
23                      2  0.3%
25                      1  0.1%
29                      1  0.1%
32                      1  0.1%
36                      3  0.4%

Largest values
Value               Count  Frequency (%)
846                     1  0.1%
744                     1  0.1%
680                     1  0.1%
600                     1  0.1%
579                     1  0.1%
545                     1  0.1%
543                     1  0.1%
540                     1  0.1%
510                     1  0.1%
495                     2  0.3%

BMI
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 248
Distinct (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.45746367
Minimum: 18.2
Maximum: 67.1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 18.2
5-th percentile: 22.235
Q1: 27.5
median: 32.4
Q3: 36.6
95-th percentile: 44.395
Maximum: 67.1
Range: 48.9
Interquartile range (IQR): 9.1

Descriptive statistics

Standard deviation: 6.875151328
Coefficient of variation (CV): 0.2118203504
Kurtosis: 0.91949026
Mean: 32.45746367
Median Absolute Deviation (MAD): 4.6
Skewness: 0.5982526551
Sum: 24927.3321
Variance: 47.26770578
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Most frequent values
Value               Count  Frequency (%)
32                     13  1.7%
31.6                   12  1.6%
31.2                   12  1.6%
32.45746367            11  1.4%
32.4                   10  1.3%
33.3                   10  1.3%
30.1                    9  1.2%
32.8                    9  1.2%
32.9                    9  1.2%
30.8                    9  1.2%
Other values (238)    664  86.5%

Smallest values
Value               Count  Frequency (%)
18.2                    3  0.4%
18.4                    1  0.1%
19.1                    1  0.1%
19.3                    1  0.1%
19.4                    1  0.1%
19.5                    2  0.3%
19.6                    3  0.4%
19.9                    1  0.1%
20                      1  0.1%
20.1                    1  0.1%

Largest values
Value               Count  Frequency (%)
67.1                    1  0.1%
59.4                    1  0.1%
57.3                    1  0.1%
55                      1  0.1%
53.2                    1  0.1%
52.9                    1  0.1%
52.3                    2  0.3%
50                      1  0.1%
49.7                    1  0.1%
49.6                    1  0.1%

DiabetesPedigreeFunction
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 517
Distinct (%): 67.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4718763021
Minimum: 0.078
Maximum: 2.42
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0.078
5-th percentile: 0.14035
Q1: 0.24375
median: 0.3725
Q3: 0.62625
95-th percentile: 1.13285
Maximum: 2.42
Range: 2.342
Interquartile range (IQR): 0.3825

Descriptive statistics

Standard deviation: 0.331328595
Coefficient of variation (CV): 0.7021513764
Kurtosis: 5.594953528
Mean: 0.4718763021
Median Absolute Deviation (MAD): 0.1675
Skewness: 1.919911066
Sum: 362.401
Variance: 0.1097786379
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Most frequent values
Value               Count  Frequency (%)
0.258                   6  0.8%
0.254                   6  0.8%
0.268                   5  0.7%
0.207                   5  0.7%
0.261                   5  0.7%
0.259                   5  0.7%
0.238                   5  0.7%
0.19                    4  0.5%
0.263                   4  0.5%
0.299                   4  0.5%
Other values (507)    719  93.6%

Smallest values
Value               Count  Frequency (%)
0.078                   1  0.1%
0.084                   1  0.1%
0.085                   2  0.3%
0.088                   2  0.3%
0.089                   1  0.1%
0.092                   1  0.1%
0.096                   1  0.1%
0.1                     1  0.1%
0.101                   1  0.1%
0.102                   1  0.1%

Largest values
Value               Count  Frequency (%)
2.42                    1  0.1%
2.329                   1  0.1%
2.288                   1  0.1%
2.137                   1  0.1%
1.893                   1  0.1%
1.781                   1  0.1%
1.731                   1  0.1%
1.699                   1  0.1%
1.698                   1  0.1%
1.6                     1  0.1%

Age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 52
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.24088542
Minimum: 21
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 24
median: 29
Q3: 41
95-th percentile: 58
Maximum: 81
Range: 60
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.76023154
Coefficient of variation (CV): 0.3537881556
Kurtosis: 0.6431588885
Mean: 33.24088542
Median Absolute Deviation (MAD): 7
Skewness: 1.129596701
Sum: 25529
Variance: 138.3030459
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Most frequent values
Value               Count  Frequency (%)
22                     72  9.4%
21                     63  8.2%
25                     48  6.2%
24                     46  6.0%
23                     38  4.9%
28                     35  4.6%
26                     33  4.3%
27                     32  4.2%
29                     29  3.8%
31                     24  3.1%
Other values (42)     348  45.3%

Smallest values
Value               Count  Frequency (%)
21                     63  8.2%
22                     72  9.4%
23                     38  4.9%
24                     46  6.0%
25                     48  6.2%
26                     33  4.3%
27                     32  4.2%
28                     35  4.6%
29                     29  3.8%
30                     21  2.7%

Largest values
Value               Count  Frequency (%)
81                      1  0.1%
72                      1  0.1%
70                      1  0.1%
69                      2  0.3%
68                      1  0.1%
67                      3  0.4%
66                      4  0.5%
65                      3  0.4%
64                      1  0.1%
63                      4  0.5%

Outcome
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3489583333
Minimum: 0
Maximum: 1
Zeros: 500
Zeros (%): 65.1%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.4769513772
Coefficient of variation (CV): 1.366786036
Kurtosis: -1.600929755
Mean: 0.3489583333
Median Absolute Deviation (MAD): 0
Skewness: 0.6350166434
Sum: 268
Variance: 0.2274826163
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=2)
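
Because Outcome only takes the values 0 and 1, its mean is simply the share of ones, and the other moments follow from that share. A small worked sketch using the counts in this report (268 ones among 768 rows), assuming the sample (n - 1) variance; variable names are illustrative.

```python
import math

n_ones, n_rows = 268, 768             # Sum and number of observations from the report

p = n_ones / n_rows                   # mean of a 0/1 column = share of ones
variance = p * (1 - p) * n_rows / (n_rows - 1)   # sample variance of a 0/1 column
std = math.sqrt(variance)
cv = std / p                          # coefficient of variation

print(p, variance, std, cv)
```
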
Most frequent values
Value               Count  Frequency (%)
0                     500  65.1%
1                     268  34.9%

Smallest values
Value               Count  Frequency (%)
0                     500  65.1%
1                     268  34.9%

Largest values
Value               Count  Frequency (%)
1                     268  34.9%
0                     500  65.1%

Interactions

[Figure: pairwise interaction plots for each pair of the nine variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
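
The recipe above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. Giving tied values their average rank is an assumption the text does not spell out, and the function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Pearson correlation applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]     # y = x**3: monotonic but not linear
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this toy input the ranks of x and y coincide, so ρ reaches its maximum while r on the raw values stays below it.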

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
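
A minimal sketch of that calculation in plain Python, with a check of the invariance under changes of location and scale mentioned above. The data are made up for illustration and `pearson_r` is an illustrative name; the 1/(n - 1) factors of the covariance and standard deviations cancel, so they are left out.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_original = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([2 * a + 3 for a in x], [0.5 * b - 1 for b in y])
print(r_original, r_rescaled)
```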

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
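
The pair counting can be sketched directly. This version follows the description literally: tied pairs count as neither concordant nor discordant but stay in the denominator (often called tau-a). Tie-corrected variants such as tau-b give different values when ties are present, and the report does not say which variant it uses. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]
tau = kendall_tau(x, y)   # 7 concordant and 3 discordant pairs out of 10
print(tau)
```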

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

Figure: A simple visualization of nullity by column.
Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

     Pregnancies  Glucose  BloodPressure  SkinThickness     Insulin        BMI  DiabetesPedigreeFunction  Age  Outcome
  0            6    148.0      72.000000       35.00000  155.548223  33.600000                     0.627   50        1
  1            1     85.0      66.000000       29.00000  155.548223  26.600000                     0.351   31        0
  2            8    183.0      64.000000       29.15342  155.548223  23.300000                     0.672   32        1
  3            1     89.0      66.000000       23.00000   94.000000  28.100000                     0.167   21        0
  4            0    137.0      40.000000       35.00000  168.000000  43.100000                     2.288   33        1
  5            5    116.0      74.000000       29.15342  155.548223  25.600000                     0.201   30        0
  6            3     78.0      50.000000       32.00000   88.000000  31.000000                     0.248   26        1
  7           10    115.0      72.405184       29.15342  155.548223  35.300000                     0.134   29        0
  8            2    197.0      70.000000       45.00000  543.000000  30.500000                     0.158   53        1
  9            8    125.0      96.000000       29.15342  155.548223  32.457464                     0.232   54        1

Last rows

     Pregnancies  Glucose  BloodPressure  SkinThickness     Insulin        BMI  DiabetesPedigreeFunction  Age  Outcome
758            1    106.0           76.0       29.15342  155.548223       37.5                     0.197   26        0
759            6    190.0           92.0       29.15342  155.548223       35.5                     0.278   66        1
760            2     88.0           58.0       26.00000   16.000000       28.4                     0.766   22        0
761            9    170.0           74.0       31.00000  155.548223       44.0                     0.403   43        1
762            9     89.0           62.0       29.15342  155.548223       22.5                     0.142   33        0
763           10    101.0           76.0       48.00000  180.000000       32.9                     0.171   63        0
764            2    122.0           70.0       27.00000  155.548223       36.8                     0.340   27        0
765            5    121.0           72.0       23.00000  112.000000       26.2                     0.245   30        0
766            1    126.0           60.0       29.15342  155.548223       30.1                     0.349   47        1
767            1     93.0           70.0       31.00000  155.548223       30.4                     0.315   23        0